Semantic speech recognition in the Basque context Part I: cross-lingual approaches

Authors

  • Nora Barroso
  • Karmele López de Ipiña
  • Odei Barroso
  • Aitzol Ezeiza
  • Carmen Hernández
  • Manuel Graña
Abstract

This work, divided into Parts I and II, describes the development of GorUP, a Semantic Speech Recognition System in the Basque context. Part I analyses cross-lingual approaches oriented to under-resourced languages, and Part II the development of the Language Identification system. During the development, data optimization methods and Soft Computing methodologies oriented to complex environments are used in order to overcome the lack of resources. Moreover, three languages coexist in this context: French, Spanish and Basque. Our main goal is the development of robust Automatic Speech Recognition (ASR) systems for Basque, but all language variability has to be analyzed. In this regard, Basque speakers mix not only sounds but also words of the three languages in their speech, which results in a strong presence of cross-lingual elements. Besides, Basque is an agglutinative language with a special morpho-syntactic structure inside the words that may lead to intractable vocabularies. Currently, our work is oriented to Information Retrieval, mainly for small internet mass media. In these cases the available resources for Basque in general, and for this task in particular, are very few and complex to process because of the noisy environment. Thus, the methods employed in this development (an ontology-based approach and cross-lingual methodologies that profit from better-resourced languages) could suit the requirements of many under-resourced languages.

Author affiliations:

  • N. Barroso · O. Barroso: Irunweb Enterprise, Auzolan 2B – 2, Irun, 20303, Basque Country, Spain
  • K. López de Ipiña · A. Ezeiza · C. Hernández · M. Graña: Grupo de Inteligencia Computacional, University of the Basque Country UPV/EHU, Plaza de Europa 1, 20008 Donostia, Spain

Similar resources

Cross-Lingual Approaches: The Basque Case

Cross-lingual speech recognition could be relevant for Multilingual Automatic Speech Recognition (ASR) systems that work with both under-resourced and well-resourced languages. In the Basque Country, the interest in Multilingual Automatic Speech Recognition systems comes from the fact that there are three official languages in use (Basque, Spanish, and French). Multilingual Basq...

Acoustic Phonetic Decoding Oriented to Multilingual Speech Recognition in the Basque Context

The development of Large Vocabulary Continuous Speech Recognition systems involves issues such as Acoustic Phonetic Decoding, Language Modelling and the development of appropriate Language Resources. In the state of the art, new techniques for reusing the Language Resources of better-resourced related languages are attracting great interest, and there is also a growing interest in Multilingual systems. T...

Phonetic and Prosodic Aspects in the Cross-lingual Pronunciation Tutoring

Computer-assisted pronunciation tutoring (CAPT) methods are well established in research and education. Common system approaches include phonetic quality assessment, highlight problematic sections in the speech signal, and usually rely on automatic speech recognition (ASR) for the target language L2. This contribution deals with the audiovisual CAPT system AzAR. An extensive feedb...

QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages

This work presents parallel corpora automatically annotated with several NLP tools, covering lemma and part-of-speech tagging, named-entity recognition and classification, named-entity disambiguation, word-sense disambiguation, and coreference. The corpora comprise both the well-known Europarl corpus and a domain-specific question-answer troubleshooting corpus in the IT domain. English is comm...

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

Journal:
  • I. J. Speech Technology

Volume 15

Publication date: 2012